COBRAS: Fast, Iterative, Active Clustering with Pairwise Constraints
Authors
Abstract
Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a few of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a super-instance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such super-instances in a top-down manner to produce high-quality results early on in the clustering process, and keeps refining these super-instances as more pairwise queries are answered, yielding more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings with fast run times, making it an excellent candidate for the iterative clustering scenario outlined above.
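The role of super-instances in saving queries can be illustrated with a small sketch. This is not the COBRAS algorithm itself: it only shows how querying one representative per super-instance, rather than every pair of instances, lets a few answers label many points. The function name `merge_super_instances`, the choice of the first instance as representative, and the simulated `oracle` are all assumptions made for illustration; the top-down refinement step described in the abstract is not shown.

```python
def merge_super_instances(super_instances, same_cluster):
    """Group super-instances into clusters using pairwise queries.

    super_instances: list of lists of instance ids. The first id of each
        list is used as its representative (an illustrative assumption).
    same_cluster: callable(a, b) -> bool, standing in for the user's
        answer to "should a and b be in the same cluster?".
    Returns (labels, n_queries), where labels maps instance id -> cluster id.
    """
    clusters = []   # each cluster is a list of super-instances
    n_queries = 0
    for si in super_instances:
        for cluster in clusters:
            n_queries += 1
            # Compare representatives only: one answer covers all members.
            if same_cluster(si[0], cluster[0][0]):
                cluster.append(si)   # must-link: join this cluster
                break
        else:
            # Cannot-link with every existing cluster: start a new one.
            clusters.append([si])

    labels = {}
    for label, cluster in enumerate(clusters):
        for si in cluster:
            for instance in si:
                labels[instance] = label
    return labels, n_queries


# Hypothetical data: 8 instances grouped into 4 super-instances.
super_instances = [[0, 1, 2], [3, 4], [5, 6], [7]]
truth = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "a", 6: "a", 7: "b"}


def oracle(a, b):
    """Simulated user who answers according to the hidden ground truth."""
    return truth[a] == truth[b]


labels, n_queries = merge_super_instances(super_instances, oracle)
print(labels, n_queries)
```

In this toy run four queries are enough to assign all eight instances to two clusters. The result is only as good as the assumption that each super-instance is pure, which is why an approach like the one in the abstract keeps refining super-instances as more answers arrive.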
Similar resources
An Efficient Iterative Framework for Semi-Supervised Clustering Based Batch Sequential Active Learning Approach
Semi-supervised learning is a field of machine learning. In previous work, the selection of pairwise constraints for semi-supervised clustering was resolved using an active learning method in an iterative manner. Semi-supervised clustering is derived from pairwise constraints, of which there are two kinds: must-link and cannot-link. In this system, enhanced iterative...
Active Learning of constraints using incremental approach in semi-supervised clustering
Semi-supervised clustering aims to improve clustering performance by considering user-provided side information in the form of pairwise constraints. We study the active learning problem of selecting must-link and cannot-link pairwise constraints for semi-supervised clustering. We consider active learning in an iterative framework; in each iteration, queries are selected based on the current cluster...
COBRA: A Fast and Simple Method for Active Clustering with Pairwise Constraints
Clustering is inherently ill-posed: there often exist multiple valid clusterings of a single dataset, and without any additional information a clustering system has no way of knowing which clustering it should produce. This motivates the use of constraints in clustering, as they allow users to communicate their interests to the clustering system. Active constraint-based clustering algorithms se...
Active Query Selection and Spectral Eigenvectors Semi-Supervised Clustering
Semi-supervised clustering aims to improve clustering performance by considering user supervision in the form of pairwise constraints. In this paper, we study the active learning problem of selecting pairwise must-link and cannot-link constraints for semi-supervised clustering. We consider active learning in an iterative manner, where in each iteration queries are selected based on the current cl...